This case study shows how Bayesian Hidden Markov Models (HMMs), fit in Stan, can extract useful information from basketball player tracking data. Specifically, we show how to tag drive events and how to determine defensive assignment. The approach taken here is semi-supervised, since we observe the outcome variables but not the hidden states.
Before diving into basketball data we show how to fit a HMM in Stan using a simple example. This should help build some intuition for those who are unfamiliar with HMMs and will also show how to specify a HMM using the Stan probabilistic programming language.
HMMs enable you to model a series of observed values. Each observed value maps to a state value, so you also have a series of states that corresponds to the series of observed values. The state series exhibits the Markov property, so the value of the state at time \(t\) only depends on the value of the state at time \(t-1\). Often the states are hidden, so the goal of the model is to infer the most likely sequence of hidden states given the sequence of observed values.
The plot below shows data generated from a HMM with two states, where the observed outcomes are drawn from the normal distribution. At each time step we are in one of the two states, and each state corresponds to a location parameter: the observed value is generated from the normal distribution whose location parameter matches the current state. In more complicated data you could have multiple observations and many more states at each time step.
You can see how state 1 corresponds to smaller values of the outcome while state 2 corresponds to relatively higher values of the outcome. In most real-world situations you do not know the state value at each time step because it is hidden. So in order to fit the model and infer the state sequence you first need to make an assumption as to how many possible states there might be at each time step.
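As a minimal sketch of this generative process in R (the transition probabilities and emission means here are hypothetical, chosen only for illustration):

```r
set.seed(1)
N <- 100
theta <- matrix(c(0.9, 0.1,   # P(1 -> 1), P(1 -> 2)
                  0.2, 0.8),  # P(2 -> 1), P(2 -> 2)
                nrow = 2, byrow = TRUE)
mu <- c(3, 9)                 # emission means for states 1 and 2
z <- numeric(N)               # hidden states
y <- numeric(N)               # observed values
z[1] <- 1
y[1] <- rnorm(1, mu[z[1]], 1)
for (t in 2:N) {
  z[t] <- sample(1:2, 1, prob = theta[z[t-1], ])  # Markov transition
  y[t] <- rnorm(1, mu[z[t]], 1)                   # emission
}
```

The state sequence z selects which of the two emission means generates each y; the fitting task is to recover z and the parameters from y alone.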
hmm_data <- readRDS("../data/hmm_example.RDS")
z <- hmm_data$z
y <- hmm_data$y
par(mfrow=c(2,1))
plot(hmm_data$z, type="s",
main = "Hidden States",
ylab = "State Value",
xlab = "Time",
ylim = c(0.5,2.5), yaxt = "n")
axis(2, 1:2, 1:2)
plot(hmm_data$y, type = "l",
main = "Observed Output",
ylab = "Observation Value",
xlab = "Time")
y_plt <- hmm_data$y
y_plt[hmm_data$z==1] <- NA
lines(y_plt, lwd = 3)
legend("bottomright", c("State 1","State 2"), lty = c(1,1), lwd = c(1,3), cex = 0.8)
Below is a representation of the HMM that generated the data above.
INSERT FIGURE
The parameters that we want to estimate in this model are the transition probabilities (\(\boldsymbol{\theta}\)) and the parameters associated with the emission probabilities (\(\mu_1\),\(\mu_2\)).
We can interpret an element of the transition matrix \(\boldsymbol{\theta}\) as the probability of moving from the row state at time \(t-1\) to the column state at time \(t\). So the diagonal elements give the probability of staying in the same state, and the off-diagonal elements give the probability of transitioning from one state to another. Note that each row of the matrix sums to 1. Given our example we have,

\[ \boldsymbol{\theta} = \begin{bmatrix} \theta_{1,1} & \theta_{1,2} \\ \theta_{2,1} & \theta_{2,2} \end{bmatrix}, \quad \theta_{1,1} + \theta_{1,2} = 1, \quad \theta_{2,1} + \theta_{2,2} = 1 \]
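To make the interpretation concrete, here is a toy two-state transition matrix in R (hypothetical values), where rows index the state at time \(t-1\) and columns the state at time \(t\):

```r
theta <- matrix(c(0.7, 0.3,
                  0.1, 0.9),
                nrow = 2, byrow = TRUE)
theta[1, 2]      # probability of moving from state 1 to state 2
rowSums(theta)   # each row sums to 1
```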
Since the data in this example are normally distributed the emission probabilities come from the normal distribution. The emission probabilities depend on the location and scale parameter associated with each state (i.e. emission parameters). In our example we assume the scale parameter is known and only have to estimate the location parameters \(\mu_1\) and \(\mu_2\) for each state.
In order to estimate the parameters we need to define the posterior distribution, which requires us to specify the likelihood of the data and the priors. The likelihood is defined by the probability of observing that particular sequence of outcome variables, and can be computed with the forward algorithm. The priors are defined on the transition matrix and on the emission parameters. Since each row of the transition matrix sums to 1, a natural prior choice is to place a Dirichlet distribution on each row of the matrix. Using the likelihood and the priors we can estimate the transition matrix and the emission parameters. Once we have estimated the parameters we can determine the most probable state sequence that generated the sequence of observations. This can be achieved with the Viterbi algorithm.
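Before looking at the Stan implementation, the forward algorithm can be sketched in plain R. This sketch assumes two things for simplicity: the emission means mu are known, and the emission scale is 1 (matching the example above); the recursion works in log space to avoid underflow.

```r
log_sum_exp <- function(x) max(x) + log(sum(exp(x - max(x))))

forward_loglik <- function(y, theta, mu, sigma = 1) {
  K <- length(mu)
  N <- length(y)
  gamma <- matrix(NA_real_, N, K)  # gamma[t, k]: log joint of y[1:t] and state k at t
  gamma[1, ] <- dnorm(y[1], mu, sigma, log = TRUE)
  for (t in 2:N) {
    for (k in 1:K) {
      # sum (in log space) over all states j we could have come from
      acc <- gamma[t - 1, ] + log(theta[, k]) + dnorm(y[t], mu[k], sigma, log = TRUE)
      gamma[t, k] <- log_sum_exp(acc)
    }
  }
  log_sum_exp(gamma[N, ])          # marginalize over the final state
}
```

This mirrors the recursion in the model {} block of the Stan program below, which additionally treats theta and the emission parameters as unknowns.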
Below we represent the HMM in Stan (also available in models/hmm_example.stan). The model {} block specifies the priors and implements the forward algorithm, which computes the likelihood of the observed sequence; the generated quantities {} block implements the Viterbi algorithm, which enables us to determine the most likely state sequence. This model was adapted from the Stan User’s Guide.
Notice that we have chosen to declare \(\vec{\psi}\) as positive_ordered[K] psi instead of real psi[K]. This constraint enforces a strict ordering between the two \(\psi\) values associated with the two states. Without it, and without sufficiently strict priors, the states are interchangeable and the sampling algorithm will struggle to converge across the parameter chains. An example of this is provided in models/hmm_example_bad.stan. (In some situations strict priors may not be sufficient and an ordering must be enforced.)
data {
  int<lower=0> N;
  int<lower=0> K;
  real y[N];
}
parameters {
  simplex[K] theta[K];
  // real psi[K];
  positive_ordered[K] psi;
}
model {
  // priors
  target += normal_lpdf(psi[1] | 3, 1);
  target += normal_lpdf(psi[2] | 10, 1);
  // forward algorithm
  {
    real acc[K];
    real gamma[N, K];
    for (k in 1:K)
      gamma[1, k] = normal_lpdf(y[1] | psi[k], 1);
    for (t in 2:N) {
      for (k in 1:K) {
        for (j in 1:K)
          acc[j] = gamma[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | psi[k], 1);
        gamma[t, k] = log_sum_exp(acc);
      }
    }
    target += log_sum_exp(gamma[N]);
  }
}
generated quantities {
  int<lower=1,upper=K> y_star[N];
  real log_p_y_star;
  {
    int back_ptr[N, K];
    real best_logp[N, K];
    real best_total_logp;
    for (k in 1:K)
      best_logp[1, k] = normal_lpdf(y[1] | psi[k], 1);
    for (t in 2:N) {
      for (k in 1:K) {
        best_logp[t, k] = negative_infinity();
        for (j in 1:K) {
          real logp;
          logp = best_logp[t-1, j] + log(theta[j, k]) + normal_lpdf(y[t] | psi[k], 1);
          if (logp > best_logp[t, k]) {
            back_ptr[t, k] = j;
            best_logp[t, k] = logp;
          }
        }
      }
    }
    log_p_y_star = max(best_logp[N]);
    for (k in 1:K)
      if (best_logp[N, k] == log_p_y_star)
        y_star[N] = k;
    for (t in 1:(N - 1))
      y_star[N - t] = back_ptr[N - t + 1, y_star[N - t + 1]];
  }
}
Now we can fit the model to the data above in order to estimate the parameters theta and psi along with the hidden state sequence y_star.
stan_data <- list(N = length(hmm_data$y),
K = 2,
y = hmm_data$y)
hmm_fit <- stan("../models/hmm_example.stan", data = stan_data, iter = 1e3, chains = 4)
The post-estimation steps that we take to validate the model are: inspecting the parameter estimates, checking traceplots for convergence, and comparing posterior predictions with the observed output and the true states.
For the transition probabilities \(\boldsymbol{\theta}\) and the parameters associated with the emission probabilities \(\vec{\psi}\) we have the following parameter estimates.
print(hmm_fit, pars = "y_star", include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: hmm_example.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 5% 95% n_eff Rhat
## theta[1,1] 0.66 0.00 0.10 0.50 0.81 1633 1
## theta[1,2] 0.34 0.00 0.10 0.19 0.50 1633 1
## theta[2,1] 0.07 0.00 0.03 0.03 0.13 1758 1
## theta[2,2] 0.93 0.00 0.03 0.87 0.97 1758 1
## psi[1] 3.02 0.01 0.22 2.66 3.37 1263 1
## psi[2] 8.83 0.00 0.11 8.65 9.00 2043 1
## log_p_y_star -166.17 0.04 1.39 -168.96 -164.61 969 1
## lp__ -170.19 0.05 1.44 -172.99 -168.56 898 1
##
## Samples were drawn using NUTS(diag_e) at Sat Aug 17 03:17:56 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
To monitor convergence we examine the traceplots below. There is evidence of good mixing between the parameter chains.
mcmc_trace(as.array(hmm_fit), regex_pars = "^theta\\[|^psi\\[", facet_args = list(nrow = 2))
In the situation where priors are defined but an ordering is not enforced the lack of parameter convergence looks like the following (see hmm_example_bad_fit.R for code).
hmm_bad_stan <- readRDS("../results/hmm_example_bad.RDS")
mcmc_trace(as.array(hmm_bad_stan$fit), regex_pars = "^theta\\[|^psi\\[", facet_args = list(ncol = 2))
Below we plot 100 predicted outcome sequences given the predicted states and emission parameter estimates. Notice how the predictions line up nicely with the observed output values. This is one indication that the data generation process was appropriately modeled.
# extract samples
samples <- as.matrix(hmm_fit)
theta <- samples[,grep("^theta",colnames(samples))]
psi <- samples[,grep("^psi",colnames(samples))]
y_star <- samples[,grep("^y_star",colnames(samples))]
# simulate observations for each iteration in the sample
y_hat <- list()
for (i in 1:nrow(samples)) {
psi_seq <- sapply(y_star[i,], function(x){psi[i,x]})
y_hat[[i]] <- rnorm(length(psi_seq), psi_seq, 1)
}
# plot
indxs <- sample(length(y_hat), 100, replace = FALSE)
plot(hmm_data$y, type = "n",
main = "Observed vs Predicted Output",
ylab = "Observation Value",
xlab = "Time",
ylim = c(0,11))
for (i in indxs) {
lines(y_hat[[i]], col = "#ff668890")
}
lines(hmm_data$y, lwd = 2)
legend("bottomright", c("Observed","Predicted"), col = c("#000000","#ff668890"), lty = c(1,1), lwd = c(2,1), cex = 0.8)
Below we overlay the predicted states with the true states. Similar to the conclusion above, our predictions line up nicely with the true values. Note that we can only do this because we know the true value of the hidden states. In situations where the states are truly hidden this step is infeasible.
# visualization
par(mfrow=c(2,1))
plot(hmm_data$z, type="s",
main = "Latent States",
ylab = "State Value",
xlab = "Time",
ylim = c(0.5,2.5), yaxt = "n")
axis(2, 1:2, 1:2)
points(colMeans(y_star), cex = 0.5)
legend("bottomright", c("Actual","Predicted"), pch = c(NA,1), lty = c(1,NA), cex = 0.5)
plot(hmm_data$y, type = "l",
main = "Observed Output",
ylab = "Observation Value",
xlab = "Time")
y_plt <- hmm_data$y
y_plt[hmm_data$z==1] <- NA
lines(y_plt, lwd = 3)
legend("bottomright", c("State 1","State 2"), lty = c(1,1), lwd = c(1,3), cex = 0.8)
We can now adapt this methodology to identify a drive and defensive assignment in basketball player tracking data.
For those who are unfamiliar with basketball, a drive occurs when a player dribbles the ball towards the hoop for a shot attempt (often a layup). We can translate a drive into two events that happen simultaneously over time until the shot: the player’s distance from the hoop decreases, and the player’s speed increases.
The video below illustrates what a drive event looks like in the player tracking data (see drive_data.R for code). This drive possession was attributed to Zach LaVine in the Minnesota Timberwolves v Boston Celtics game on 12/21/2015.
Using the player tracking data we can construct the speed and distance metrics associated with LaVine’s drive event. We define speed as distance over time and use Euclidean distance to determine the player’s distance from the hoop at each time step. Below we show these metrics along with the player tracking data (see drive_data.R for code). Notice how he decreases his distance from the basket as he increases his speed as he drives to the hoop.
It is apparent that the speed metric is pretty noisy. This may be attributed to the fact that positions are sampled at 25 hertz (25 observations per second) and that location is determined by a computer vision algorithm rather than a tracking chip attached to the player. There are many methods that can be used to smooth the data (e.g. splines). Here we use a basic rolling mean with a window of three time steps. In our example the data is not so noisy that it would affect the performance of our model, so the smoothing is mostly for aesthetics and ease of interpretation. If the variance in the series were more extreme then we might want to consider a better smoothing method before fitting our model.
drive_data <- readRDS("../data/evt140_0021500411.RDS")
lavine_speed_smooth <- rep(drive_data$game$lavine_speed[2],2)
for (i in 3:nrow(drive_data$game))
lavine_speed_smooth[i] <- mean(drive_data$game$lavine_speed[(i-2):i], na.rm=TRUE)
par(mfrow = c(3,1))
plot(drive_data$game$lavine_dist, type = "l",
main = "Distance from Hoop",
xlab = "Time (25hz)", ylab = "Distance from Hoop")
plot(drive_data$game$lavine_speed, type = "l",
main = "Raw Speed",
xlab = "Time (25hz)", ylab = "Speed")
plot(lavine_speed_smooth, type = "l",
main = "Smooth Speed",
xlab = "Time (25hz)", ylab = "Speed")
Now that we have transformed the data appropriately we can specify and fit the model.
Here our HMM needs to infer two hidden states, drive and none, and we have to model both the speed and distance observed sequences. An existing approach models 1/speed in order to get the speed and distance sequences to trend in the same way between the drive and non-drive state (Keshri et al 2017). In this case a drive is characterized by simultaneously small values of 1/speed and of distance from the hoop.
If the data is transformed in this way, a natural choice for the emission probabilities would be the exponential distribution, since both metrics are positive. While the exponential distribution is mathematically more tractable, we argue that this choice makes the model computationally unstable. Instead, we opt to use the normal distribution on both the 1/speed and distance metrics, which enables us to model the data on the log scale.
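A quick numerical sketch (with hypothetical speed and distance values) shows that on the log scale both metrics are lower during a drive than outside of one:

```r
# Hypothetical values: during a drive the player is fast and close to the
# hoop, so both 1/speed and distance (and hence their logs) are small.
speed_drive <- 12; dist_drive <- 6    # ft/s, ft
speed_none  <- 2;  dist_none  <- 28
log(1 / speed_drive) < log(1 / speed_none)  # TRUE
log(dist_drive) < log(dist_none)            # TRUE
```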
With the normal distributions defined over the observed data, our HMM is defined as follows,
INSERT MODEL MATH
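As a hedged sketch of the model we have in mind (the scale parameterization is our assumption; \(s_t\) denotes the smoothed speed and \(d_t\) the distance from the hoop at time \(t\), with \(\phi\) and \(\lambda\) the state-specific location parameters):

\[ z_{t} \sim \mbox{Categorical}(\theta_{z_{t-1}}) \\ \log(1/s_{t}) \sim \mathcal{N}(\phi_{z_{t}}, \sigma^{2}) \\ \log(d_{t}) \sim \mathcal{N}(\lambda_{z_{t}}, \sigma^{2}) \]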
Below is an illustrative representation of the process that we are trying to model.
INSERT FIGURE
Finally, we fit the model to LaVine’s data for the single drive event (see drive_1.R for code). We provide the model fit diagnostics below.
drive_stan <- readRDS("../results/drive_1.RDS")
print(drive_stan$fit, pars = 'y_star', include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: drive_1.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 5% 95% n_eff Rhat
## theta[1,1] 0.97 0.00 0.02 0.93 0.99 2335 1
## theta[1,2] 0.03 0.00 0.02 0.01 0.07 2335 1
## theta[2,1] 0.01 0.00 0.01 0.00 0.02 2216 1
## theta[2,2] 0.99 0.00 0.01 0.98 1.00 2216 1
## phi[1] -2.32 0.00 0.12 -2.51 -2.12 1162 1
## phi[2] -0.74 0.00 0.06 -0.83 -0.65 2105 1
## lambda[1] 2.38 0.00 0.12 2.19 2.57 1319 1
## lambda[2] 3.53 0.00 0.05 3.45 3.62 2119 1
## log_p_y_star -864.26 0.06 2.28 -868.61 -861.27 1538 1
## lp__ -887.59 0.06 1.75 -890.93 -885.30 828 1
##
## Samples were drawn using NUTS(diag_e) at Sat Aug 17 01:30:29 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
mcmc_trace(as.array(drive_stan$fit), regex_pars = "^theta\\[|^phi\\[|^lambda\\[", facet_args = list(nrow = 2))
For comparison we implemented the exponential distribution version of the model in drive_0.R. In line with our theory we found that the parameter chains struggled to converge in some model runs, validating our approach to use the normal distribution to model the log-transformed data.
We post-process the state predictions and layer them on top of the original distance/speed plots to see if the predictions line up with our logic.
samples <- as.matrix(drive_stan$fit)
y_star <- colMeans(samples[,grep("^y_star", colnames(samples))])
par(mfrow = c(3,1))
plot(drive_data$game$lavine_dist, type = "l",
main = "Distance from Hoop",
xlab = "Time (25hz)", ylab = "Distance from Hoop")
plot(lavine_speed_smooth, type = "l",
main = "Smooth Speed",
xlab = "Time (25hz)", ylab = "Speed")
plot(round(y_star), type = "l", pch = 1, cex = 0.5,
main = "Hidden States",
ylab = "State", xlab = "Time (25hz)",
ylim = c(0.5, 2.5), yaxt = "n")
axis(2, c(1,2), c("Drive", "None"), las = 2)
We can also layer the predicted latent states on top of the previous video for a more comprehensive view of how the state sequence behaves in relation to what is happening on the court.
It looks like things line up nicely. The drive event gets triggered only when the player dramatically reduces their distance from the basket while increasing their speed over time.
Our final task is to determine defensive assignment in basketball using HMMs. We follow an approach similar to Franks et al (2015); however, instead of fitting the HMM using the EM algorithm, we fit the model in Stan. Before diving into the model we need to abstract what good basketball defense is, mathematically speaking. Arguably, good defensive positioning means that the defender is somewhere in the triangle defined by the locations of the hoop, the ball, and the offensive player in question. An example of this is illustrated below.
h <- c(0,-1)
b <- c(0,1.5)
o <- c(-1,0.5)
d <- c(-0.3,0.3)
plot(rbind(h,b,o,h), type = "o",
xlim = c(-2,1), ylim = c(-2,2),
xlab = "", ylab = "", xaxt = "n", yaxt = "n",
pch = 20)
points(d[1],d[2], pch = 20)
text(h[1],h[2], "hoop", pos = 1, cex = 0.8)
text(b[1],b[2], "ball", pos = 3, cex = 0.8)
text(o[1],o[2], "offensive player", pos = 2, cex = 0.8)
text(d[1],d[2], "defender", pos = 3, cex = 0.8)
Using this concept, we model the defender’s two positional coordinates as draws from the normal distribution, where the location parameter is defined as a convex combination of the hoop, ball, and offensive player positions. Mathematically we can represent this for each time step as,
\[ y_{1}, y_{2} \sim \mathcal{N}(\mu_{k}, \sigma^{2}) \\ \mu_{k} = O_{k}\lambda_{1} + B\lambda_{2} + H\lambda_{3} \] where, at each time step, \(O_{k}\) is the position of the kth offensive player, \(B\) is the position of the ball, and \(H\) is the position of the hoop. Because \(\mu_{k}\) is a convex combination we also have the following constraint,
\[ \lambda_{1} + \lambda_{2} + \lambda_{3} = 1 \\ \Leftrightarrow \\ \Lambda \mathbf{1} = 1 \]
Defining the model with the state process \(z_t\) and the time steps gives the following,
\[ z_{t} \sim \mbox{Categorical}(\theta_{z_{t-1}}) \\ y_{t1}, y_{t2} \sim \mathcal{N}(\mu_{z_{t}}, \sigma^{2}) \\ \mu_{z_{t}} = O_{z_{t}}\lambda_{1} + B\lambda_{2} + H\lambda_{3} \\ \Lambda \mathbf{1} = 1 \]
For a single defender, this model determines which one of the five offensive players is being guarded at each time step.
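The convex combination itself is just a weighted average of the three positions. A quick numerical sketch (hypothetical coordinates and weights):

```r
lambda <- c(0.5, 0.3, 0.2)   # weights for offensive player, ball, hoop
stopifnot(abs(sum(lambda) - 1) < 1e-12)  # must be a convex combination
o_k <- c(-1.0,  0.5)         # offensive player (x, y)
b   <- c( 0.0,  1.5)         # ball
h   <- c( 0.0, -1.0)         # hoop
mu_k <- o_k * lambda[1] + b * lambda[2] + h * lambda[3]
mu_k  # lies inside the triangle defined by o_k, b, and h
```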
Below is a graphical representation of the model when considering a single defender.
In Stan we can represent the model as follows.
Before applying the model directly to the player tracking data, we should check that it performs appropriately on simulated data. Working in this controlled setting will also build intuition about what the model is doing.
The code here simulates an overly simplified basketball possession. For 20 time steps we sample a fixed position for each of five offensive players (o1 to o5) and the ball. (We do this with trivial noise to mimic the player tracking data.) We include the hoop location as a single coordinate since its position is fixed over time. With all these pieces in place we trace the path of a single defender in a parabola around the hoop.
defense_example <- readRDS("../data/defense_example.RDS")
list2env(defense_example, .GlobalEnv)
## <environment: R_GlobalEnv>
# plot
plt_defense_example(defense_example, main = "Defense Example")
points(d[,1], d[,2], col = "#ff6688", pch = 0)
Intuitively we want the model to say that the defender is guarding a particular offensive player if the defender is closest to the convex combination generated by that offensive player.
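This intuition can be sketched directly: with hypothetical coordinates, the defender should be assigned to the offensive player \(k\) whose \(\mu_k\) is nearest in Euclidean distance.

```r
d  <- c(-0.3, 0.3)                    # defender position (hypothetical)
mu <- rbind(c(-0.4,  0.2),            # mu_k for three offensive players
            c( 0.6,  0.9),
            c( 1.2, -0.5))
# Euclidean distance from the defender to each mu_k
dists <- sqrt(rowSums((mu - matrix(d, nrow(mu), 2, byrow = TRUE))^2))
which.min(dists)  # the emission model favors this assignment (player 1 here)
```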
Below we fit the model to the simulated data assuming that the convex combination parameters are known and fixed to \(\Lambda = [1/3,1/3,1/3]\). This approach strongly biases the model toward prior knowledge about where defenders are situated when guarding offensive players. Specifically, it states that each location (hoop, ball, and offensive player) is weighted equally in the convex combination, so that the defender’s location does not favor one position over another.
# fixing convex combination parameter
defense_0a_stan <- readRDS("../results/defense_0a.RDS")
print(defense_0a_stan$fit, pars = "y_star", include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: defense_0a.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 5% 95% n_eff Rhat
## theta[1,1] 0.50 0.00 0.15 0.24 0.74 3044 1
## theta[1,2] 0.10 0.00 0.09 0.01 0.29 2944 1
## theta[1,3] 0.10 0.00 0.09 0.01 0.28 2637 1
## theta[1,4] 0.20 0.00 0.12 0.04 0.43 2750 1
## theta[1,5] 0.10 0.00 0.09 0.01 0.28 2360 1
## theta[2,1] 0.08 0.00 0.08 0.00 0.25 2506 1
## theta[2,2] 0.67 0.00 0.14 0.43 0.87 2804 1
## theta[2,3] 0.08 0.00 0.08 0.00 0.24 2648 1
## theta[2,4] 0.09 0.00 0.08 0.00 0.25 2064 1
## theta[2,5] 0.09 0.00 0.08 0.00 0.25 2144 1
## theta[3,1] 0.14 0.00 0.13 0.01 0.40 2513 1
## theta[3,2] 0.29 0.00 0.16 0.06 0.59 3165 1
## theta[3,3] 0.28 0.00 0.16 0.06 0.59 2429 1
## theta[3,4] 0.15 0.00 0.12 0.01 0.41 2546 1
## theta[3,5] 0.14 0.00 0.12 0.01 0.38 2539 1
## theta[4,1] 0.17 0.00 0.14 0.01 0.45 3057 1
## theta[4,2] 0.16 0.00 0.14 0.01 0.44 2723 1
## theta[4,3] 0.33 0.00 0.18 0.08 0.66 2760 1
## theta[4,4] 0.17 0.00 0.14 0.01 0.45 2091 1
## theta[4,5] 0.17 0.00 0.15 0.01 0.46 2327 1
## theta[5,1] 0.22 0.00 0.13 0.05 0.48 2959 1
## theta[5,2] 0.11 0.00 0.10 0.01 0.31 3528 1
## theta[5,3] 0.11 0.00 0.10 0.01 0.31 2730 1
## theta[5,4] 0.11 0.00 0.10 0.01 0.30 2846 1
## theta[5,5] 0.45 0.00 0.15 0.20 0.70 2273 1
## log_p_y_star -36396.56 0.05 2.48 -36400.96 -36392.85 2224 1
## lp__ -36451.65 0.14 3.57 -36458.16 -36446.37 682 1
##
## Samples were drawn using NUTS(diag_e) at Wed Aug 14 23:10:50 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
The model below assumes that \(\Lambda\) is not known and has to be estimated from the data. We can still incorporate prior information about defensive positioning by applying an appropriate prior distribution on \(\Lambda\); however, this is less strict than declaring \(\Lambda\) explicitly. This approach lets us learn about \(\Lambda\) from the data, which means we can compute which positioning defenders favor in the convex combination (i.e. do they tend to be closer to the ball, hoop, or offensive player).
# priors on convex combination parameter
defense_0b_stan <- readRDS("../results/defense_0b.RDS")
print(defense_0b_stan$fit, pars = "y_star", include = FALSE, probs = c(0.05,0.95))
## Inference for Stan model: defense_0b.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 5% 95% n_eff Rhat
## theta[1,1] 0.50 0.00 0.15 0.24 0.75 3160 1.00
## theta[1,2] 0.10 0.00 0.09 0.01 0.29 2599 1.00
## theta[1,3] 0.10 0.00 0.09 0.01 0.29 2523 1.00
## theta[1,4] 0.20 0.00 0.12 0.04 0.42 2631 1.00
## theta[1,5] 0.10 0.00 0.09 0.01 0.29 2542 1.00
## theta[2,1] 0.11 0.00 0.10 0.01 0.31 2523 1.00
## theta[2,2] 0.56 0.00 0.16 0.28 0.80 2278 1.00
## theta[2,3] 0.11 0.00 0.10 0.01 0.30 2629 1.00
## theta[2,4] 0.10 0.00 0.09 0.01 0.29 2172 1.00
## theta[2,5] 0.11 0.00 0.10 0.01 0.32 2293 1.00
## theta[3,1] 0.13 0.00 0.11 0.01 0.37 2516 1.00
## theta[3,2] 0.25 0.00 0.15 0.05 0.52 2687 1.00
## theta[3,3] 0.38 0.00 0.16 0.13 0.66 2660 1.00
## theta[3,4] 0.13 0.00 0.11 0.01 0.36 2247 1.00
## theta[3,5] 0.12 0.00 0.11 0.01 0.34 2434 1.00
## theta[4,1] 0.09 0.00 0.08 0.00 0.26 2857 1.00
## theta[4,2] 0.09 0.00 0.09 0.01 0.27 2544 1.00
## theta[4,3] 0.18 0.00 0.12 0.03 0.41 3254 1.00
## theta[4,4] 0.54 0.00 0.15 0.29 0.78 2845 1.00
## theta[4,5] 0.09 0.00 0.08 0.01 0.25 2601 1.00
## theta[5,1] 0.33 0.00 0.17 0.08 0.64 2965 1.00
## theta[5,2] 0.16 0.00 0.13 0.01 0.43 2567 1.00
## theta[5,3] 0.17 0.00 0.14 0.01 0.45 2486 1.00
## theta[5,4] 0.17 0.00 0.14 0.01 0.44 2535 1.00
## theta[5,5] 0.17 0.00 0.14 0.01 0.46 2263 1.00
## lambda[1] 0.59 0.00 0.00 0.59 0.59 3987 1.00
## lambda[2] 0.41 0.00 0.00 0.41 0.41 4083 1.00
## lambda[3] 0.00 0.00 0.00 0.00 0.00 2751 1.00
## log_p_y_star -12562.30 0.07 3.29 -12568.11 -12557.46 2337 1.00
## lp__ -12641.39 0.15 3.73 -12648.08 -12635.79 638 1.01
##
## Samples were drawn using NUTS(diag_e) at Wed Aug 14 23:09:22 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
For each model we plot the states associated with each position for the defender (these correspond to the offensive player numbers). Additionally, we plot the convex combination \(\mu_k\) for each offensive player. You can see how defensive assignment is related to how close the defender is to \(\mu_k\). In this example the difference between fixing and estimating \(\Lambda\) is apparent. When we estimate \(\Lambda\) the defender favors positioning themselves closer to the offensive player and the hoop rather than the ball (this is reflected in larger values of \(\lambda\) for the offensive player and the hoop).
samples_0a <- as.matrix(defense_0a_stan$fit)
samples_0b <- as.matrix(defense_0b_stan$fit)
y_star_0a <- samples_0a[,grep("^y_star", colnames(samples_0a))]
y_star_0b <- samples_0b[,grep("^y_star", colnames(samples_0b))]
y_star_0a <- colMeans(y_star_0a)
y_star_0b <- colMeans(y_star_0b)
lambda <- samples_0b[,grep("^lambda", colnames(samples_0b))]
lambda <- colMeans(lambda)
# compute convex combination given fixed lambda
mu_0a <- list(mu1 = t(rbind(o[1,1,],h,b[1,])) %*% c(1/3,1/3,1/3) %>% t %>% c,
mu2 = t(rbind(o[2,1,],h,b[1,])) %*% c(1/3,1/3,1/3) %>% t %>% c,
mu3 = t(rbind(o[3,1,],h,b[1,])) %*% c(1/3,1/3,1/3) %>% t %>% c,
mu4 = t(rbind(o[4,1,],h,b[1,])) %*% c(1/3,1/3,1/3) %>% t %>% c,
mu5 = t(rbind(o[5,1,],h,b[1,])) %*% c(1/3,1/3,1/3) %>% t %>% c)
# compute convex combination given lambda estimate
mu_0b <- list(mu1 = t(rbind(o[1,1,],h,b[1,])) %*% lambda %>% t %>% c,
mu2 = t(rbind(o[2,1,],h,b[1,])) %*% lambda %>% t %>% c,
mu3 = t(rbind(o[3,1,],h,b[1,])) %*% lambda %>% t %>% c,
mu4 = t(rbind(o[4,1,],h,b[1,])) %*% lambda %>% t %>% c,
mu5 = t(rbind(o[5,1,],h,b[1,])) %*% lambda %>% t %>% c)
par(mfrow = c(1,2))
# model defense_0a
plt_defense_example(defense_example, main = expression(paste("Defense Example: ", Lambda, " Fixed")))
lambda_0a_txt <- paste(round(c(1/3,1/3,1/3),2), collapse = ",")
text(-2,1.9, bquote(paste(Lambda, " = [", .(lambda_0a_txt),"]")), pos = 4)
text(d[,1], d[,2], labels = paste(y_star_0a), col = "#ff668890")
text(mu_0a$mu1[1], mu_0a$mu1[2], expression(mu[1]), cex = 0.8)
text(mu_0a$mu2[1], mu_0a$mu2[2], expression(mu[2]), cex = 0.8)
text(mu_0a$mu3[1], mu_0a$mu3[2], expression(mu[3]), cex = 0.8)
text(mu_0a$mu4[1], mu_0a$mu4[2], expression(mu[4]), cex = 0.8)
text(mu_0a$mu5[1], mu_0a$mu5[2], expression(mu[5]), cex = 0.8)
# model defense_0b
plt_defense_example(defense_example, main = expression(paste("Defense Example: ", Lambda, " Estimated")))
lambda_0b_txt <- sprintf("%.2f", round(lambda,2), collapse=",")
lambda_0b_txt <- paste(lambda_0b_txt, collapse = ",")
text(-2,1.9, bquote(paste(Lambda %~~% phantom(), "[", .(lambda_0b_txt),"]")), pos = 4)
text(d[,1], d[,2], labels = paste(y_star_0b), col = "#ff668890")
text(mu_0b$mu1[1], mu_0b$mu1[2], expression(mu[1]), cex = 0.8)
text(mu_0b$mu2[1], mu_0b$mu2[2], expression(mu[2]), cex = 0.8)
text(mu_0b$mu3[1], mu_0b$mu3[2], expression(mu[3]), cex = 0.8)
text(mu_0b$mu4[1], mu_0b$mu4[2], expression(mu[4]), cex = 0.8)
text(mu_0b$mu5[1], mu_0b$mu5[2], expression(mu[5]), cex = 0.8)
Now that we are confident in our model we can apply it to the player tracking data. We continue with the data from the previous section, which is based on a drive possession. This gives us an easy-to-work-with possession where the Minnesota Timberwolves are the offensive team and the Boston Celtics are the defensive team.
So far the models in this section have shown how to infer defensive assignment for only one defender. In order to apply this to possession-level player tracking data we need to determine defensive assignment for each of the 5 defenders. We implement this in Stan by adding an extra dimension \(D=5\) to the observed data \(y\) and wrapping the forward algorithm and the Viterbi algorithm in a loop that determines defensive assignment for each of the 5 defenders. Now the dimensions of \(y\) are (number of defenders, number of states (offenders), number of time steps, number of coordinates (2)), which translates to (5,5,N,2) where N is defined by the length of the possession.
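Assembling an array with those dimensions in R might look like the following sketch (N here is a hypothetical possession length; in practice it comes from the tracking data):

```r
D <- 5            # defenders
K <- 5            # states (offensive players)
N <- 100          # time steps (hypothetical possession length)
y <- array(NA_real_, dim = c(D, K, N, 2))
# Inside the Stan model, the forward and Viterbi algorithms then run once
# per defender d in 1:D over the slice y[d, , , ].
dim(y)
```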
Below we fit the HMM to the player tracking data.
Finally, we can visualize what defensive assignment looks like when inferred from this HMM.
Future work could take two directions. First, labeling the data would make the HMM parameters consistent with when the drive actually occurs. Second, pairing the HMM with computer vision algorithms could attach metadata to the drive event regarding the type of shot (floater, bank, finger-roll, etc.).